AITopics | stationary distribution

AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation

Neural Information Processing Systems, May-1-2026, 04:57:48 GMT

One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy. This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation. This challenge is amplified in the offline Multi-Agent RL (MARL) setting since the joint action space grows exponentially with the number of agents. To avoid this curse of dimensionality, existing MARL methods adopt either value decomposition methods or fully decentralized training of individual agents. However, even when combined with standard conservatism principles, these methods can still result in the selection of OOD joint actions in offline MARL. To this end, we introduce AlberDICE, an offline MARL algorithm that alternately performs centralized training of individual agents based on stationary distribution optimization. AlberDICE circumvents the exponential complexity of MARL by computing the best response of one agent at a time while effectively avoiding OOD joint action selection. Theoretically, we show that the alternating optimization procedure converges to Nash policies. In the experiments, we demonstrate that AlberDICE significantly outperforms baseline algorithms on a standard suite of MARL benchmarks.
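The idea of computing one agent's best response at a time can be illustrated on a toy problem. The sketch below is not AlberDICE itself: it omits offline data and stationary distribution correction entirely, and assumes a hypothetical two-agent matrix game in which both agents share one payoff table. It only shows why alternating best responses stop at a joint action from which no single agent can improve.

```python
def alternating_best_response(payoff, a0=0, b0=0, max_rounds=100):
    """Alternate best responses in a two-agent game with a shared payoff.

    payoff[a][b] is the common reward for joint action (a, b).
    The matrix, starting actions, and round limit are illustrative assumptions.
    """
    a, b = a0, b0
    for _ in range(max_rounds):
        # Agent 1 best-responds while agent 2's action is held fixed.
        new_a = max(range(len(payoff)), key=lambda i: payoff[i][b])
        # Agent 2 best-responds to agent 1's updated action.
        new_b = max(range(len(payoff[0])), key=lambda j: payoff[new_a][j])
        if (new_a, new_b) == (a, b):
            # Neither agent can gain by deviating alone: a pure Nash equilibrium.
            break
        a, b = new_a, new_b
    return a, b


# Two coordination outcomes: (0, 0) pays 1 and (1, 1) pays 2.
payoff = [[1, 0],
          [0, 2]]
from_low = alternating_best_response(payoff, a0=0, b0=0)
from_high = alternating_best_response(payoff, a0=1, b0=1)
```

Both starting points are already Nash equilibria in this example, so the procedure stays where it starts. A Nash joint action need not be the globally best one.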

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

e4da3b7fbbce2345d7772b0674a318d5-Paper.pdf

Neural Information Processing Systems, Apr-30-2026, 22:55:46 GMT

Expressive probabilistic sampling in recurrent neural networks

Neural Information Processing Systems, Apr-28-2026, 13:09:58 GMT

In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to explore the minimum architectural requirements for recurrent neural circuits to sample from complex distributions. We first consider the traditional sampling model consisting of a network of neurons whose outputs directly represent the samples (sampler-only network). We argue that synaptic current and firing-rate dynamics in the traditional model have limited capacity to sample from a complex probability distribution. We show that the firing rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution. We call such circuits reservoir-sampler networks (RSNs). We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling. We empirically demonstrate our model's ability to sample from several complex data distributions using the proposed neural dynamics and discuss its applicability to developing the next generation of sampling-based Bayesian brain models.
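Langevin sampling, which the abstract says the trained RSN implements, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's network: the score function is given in closed form for a one-dimensional Gaussian instead of being learned by denoising score matching, and the names `langevin_samples` and `gaussian_score`, the step size, and the burn-in length are all hypothetical choices.

```python
import math
import random


def langevin_samples(score, x0, step, n_steps, seed=0):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


def gaussian_score(mu, sigma):
    """Score (gradient of the log density) of a Gaussian N(mu, sigma^2)."""
    return lambda x: -(x - mu) / sigma ** 2


# Assumed target: N(2, 1). Discard an initial burn-in before computing statistics.
samples = langevin_samples(gaussian_score(2.0, 1.0), x0=0.0, step=0.1, n_steps=20000)
kept = samples[2000:]
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

With a finite step size the chain's stationary variance is slightly larger than the target's, so the sample variance should only be expected to land near 1, not exactly on it.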

artificial intelligence, machine learning, score function, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

958c530554f78bcd8e97125b70e6973d-Paper.pdf

Neural Information Processing Systems, Apr-26-2026, 17:01:32 GMT

arxiv preprint arxiv, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Convergence of Actor-Critic Methods with Multi-Layer Neural Networks

Neural Information Processing Systems, Apr-25-2026, 15:44:47 GMT

The early theory of actor-critic methods considered convergence using linear function approximators for the policy and value functions. Recent work has established convergence using neural network approximators with a single hidden layer. In this work we take the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice. We show that actor-critic updates projected on a ball around the initial condition will converge to a neighborhood where the average of the squared gradients is O(1/√m) + O(ϵ), with m being the width of the neural network and ϵ the approximation quality of the best critic neural network over the projected set.
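The projection step mentioned in the abstract can be sketched as follows. This is only an illustration of Euclidean projection onto a ball centered at the initial parameters; the function name, the flat-list parameter representation, and the radius are assumptions, and the abstract does not specify how the paper chooses the radius or interleaves projection with actor and critic updates.

```python
import math


def project_onto_ball(theta, theta0, radius):
    """Project parameters theta onto the Euclidean ball of given radius around theta0."""
    diff = [t - t0 for t, t0 in zip(theta, theta0)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        # Already inside the ball: leave the parameters unchanged.
        return list(theta)
    # Otherwise rescale the displacement so it lands on the ball's surface.
    scale = radius / norm
    return [t0 + scale * d for t0, d in zip(theta0, diff)]


inside = project_onto_ball([0.3, 0.4], [0.0, 0.0], radius=1.0)
clipped = project_onto_ball([3.0, 4.0], [0.0, 0.0], radius=1.0)
```

After a gradient step, a projected method would apply `project_onto_ball` to the updated parameters so that the iterates never leave the ball.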

artificial intelligence, machine learning, min 2, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

21cb5931c39d7bd21b34b3b8f14a125c-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 01:52:04 GMT

artificial intelligence, machine learning, non-daleian network, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.69)

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

Neural Information Processing Systems, Apr-24-2026, 18:30:18 GMT

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after T repetitions of the game is bounded by O(log T), improving over the prior best bounds of O(log^4(T)). At the same time, we guarantee optimal O(√T) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time T are bounded by O(log T), a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).
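For readers unfamiliar with the quantity being bounded, swap regret can be computed directly from a play history. The sketch below assumes a loss formulation with one pure action per round (the paper may work with utilities and mixed strategies), and the function names are hypothetical. Swap regret compares the realized loss against the best action-by-action remapping in hindsight, while external regret compares only against the best single fixed action.

```python
def swap_regret(actions, losses):
    """Swap regret of a sequence of pure actions.

    actions[t] is the action played in round t; losses[t][b] is the loss
    action b would have incurred in round t.
    """
    n = len(losses[0])
    # gap[a][b]: total loss saved by playing b on every round where a was played.
    gap = [[0.0] * n for _ in range(n)]
    for a, loss in zip(actions, losses):
        for b in range(n):
            gap[a][b] += loss[a] - loss[b]
    # The best swap function picks the best replacement for each action independently.
    return sum(max(row) for row in gap)


def external_regret(actions, losses):
    """Regret against the best single fixed action in hindsight."""
    n = len(losses[0])
    incurred = sum(loss[a] for a, loss in zip(actions, losses))
    best_fixed = min(sum(loss[b] for loss in losses) for b in range(n))
    return incurred - best_fixed


actions = [0, 0, 1]
losses = [[1, 0], [1, 0], [0, 1]]
sr = swap_regret(actions, losses)
er = external_regret(actions, losses)
```

In this example the player always picks the worse action, so swapping 0 for 1 and 1 for 0 saves the full loss of 3, whereas the best fixed action saves only 2. Swap regret is never smaller than external regret.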

artificial intelligence, machine learning, swap regret, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

14f75513f0f1ca01de1e826b52e6b840-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 18:11:55 GMT

artificial intelligence, exp, operator, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.49)

0d441de75945e5acbc865406fc9a2559-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 16:11:12 GMT

A.1 Connection to online learning. In Section 2 we motivated the update (2) as a way to adjust the size of our prediction sets in response to the realized historical miscoverage frequency. Alternatively, one could also derive (2) as an online gradient descent algorithm with respect to the pinball loss. To be more precise, let β_t := sup{β : Y_t ∈ Ĉ_t(β)}, where we remark that Ĉ_t(β_t) can be thought of as the smallest prediction set containing Y_t. Because the pinball loss is convex, this gradient descent update falls within a well understood class of algorithms that have been extensively studied in the online learning literature (see e.g. ...). Unfortunately, this notion of regret fails to capture our intuition that α_t is adaptively tracking the moving target α*_t.
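The equivalence described above can be checked numerically. The excerpt does not reproduce update (2) or the pinball loss, so the sketch assumes update (2) has the form alpha_{t+1} = alpha_t + gamma * (alpha - err_t), where err_t = 1 when the prediction set at level alpha_t misses Y_t (taken here as alpha_t > beta_t), and assumes the pinball loss is l(alpha_t, beta_t) = alpha * (beta_t - alpha_t) - min(0, beta_t - alpha_t). The function names and the sequence of beta values are hypothetical.

```python
def pinball_loss_grad(alpha_t, beta_t, alpha):
    """Subgradient in alpha_t of alpha * (beta_t - alpha_t) - min(0, beta_t - alpha_t)."""
    miscovered = 1.0 if alpha_t > beta_t else 0.0
    return miscovered - alpha


def run_updates(betas, alpha, gamma, alpha_1):
    """Run the assumed direct update and online gradient descent side by side."""
    direct, ogd = [alpha_1], [alpha_1]
    for beta_t in betas:
        # Direct form: shrink alpha_t after a miss, grow it after a cover.
        err_t = 1.0 if direct[-1] > beta_t else 0.0
        direct.append(direct[-1] + gamma * (alpha - err_t))
        # Online gradient descent on the pinball loss with step size gamma.
        ogd.append(ogd[-1] - gamma * pinball_loss_grad(ogd[-1], beta_t, alpha))
    return direct, ogd


betas = [0.05, 0.5, 0.08, 0.3, 0.02]
direct, ogd = run_updates(betas, alpha=0.1, gamma=0.05, alpha_1=0.1)
```

Under these assumptions the two trajectories coincide step for step, because the subgradient of the pinball loss equals err_t - alpha.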

artificial intelligence, exp, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Industry: